Overview

Dataset statistics

Number of variables: 12
Number of observations: 7110
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1075
Duplicate rows (%): 15.1%
Total size in memory: 583.4 KiB
Average record size in memory: 84.0 B

Variable types

Numeric: 4
Categorical: 8

Alerts

TipoTransaccionID has constant value "11" (Constant)
TipoTransaccionNombre has constant value "Stock Receipt" (Constant)
ClienteID has constant value "0.0" (Constant)
InvoiceID has constant value "0.0" (Constant)
Dataset has 1075 (15.1%) duplicate rows (Duplicates)
FechaTransaccion has a high cardinality: 1256 distinct values (High cardinality)
TransaccionProductoID is highly correlated with Cantidad (High correlation)
Cantidad is highly correlated with TransaccionProductoID (High correlation)
ClienteID is highly correlated with TipoTransaccionID and 5 other fields (High correlation)
TipoTransaccionID is highly correlated with ClienteID and 5 other fields (High correlation)
InvoiceID is highly correlated with ClienteID and 5 other fields (High correlation)
NombreProveedor is highly correlated with ClienteID and 5 other fields (High correlation)
NombreProducto is highly correlated with ClienteID and 5 other fields (High correlation)
TipoTransaccionNombre is highly correlated with ClienteID and 5 other fields (High correlation)
ProveedorID is highly correlated with ClienteID and 5 other fields (High correlation)
TransaccionProductoID is highly correlated with OrdenDeCompraID and 1 other field (High correlation)
ProductoID is highly correlated with NombreProducto and 3 other fields (High correlation)
NombreProducto is highly correlated with ProductoID and 3 other fields (High correlation)
ProveedorID is highly correlated with ProductoID and 2 other fields (High correlation)
NombreProveedor is highly correlated with ProductoID and 2 other fields (High correlation)
OrdenDeCompraID is highly correlated with TransaccionProductoID and 1 other field (High correlation)
Cantidad is highly correlated with TransaccionProductoID and 3 other fields (High correlation)
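
The constant-value and duplicate-row alerts can be checked on any table with a few lines of standard-library Python. The sketch below is only an illustration of such checks, not the profiler's own implementation. The rows are a small toy subset modelled on the report's sample tables, and `constant_columns`, `duplicate_groups` and `drop_duplicates` are hypothetical helper names.

```python
from collections import Counter

# Toy rows modelled on the report's sample tables; not the real dataset.
COLUMNS = ("TransaccionProductoID", "ProductoID", "TipoTransaccionID", "Cantidad")
ROWS = [
    (334073, 193, "11", 66696.0),
    (334073, 193, "11", 66696.0),  # exact repeat of the previous row
    (333459, 193, "11", 66480.0),
    (279657, 204, "11", 10.0),
]


def constant_columns(columns, rows):
    """Return the names of columns that hold a single distinct value."""
    return [
        name
        for idx, name in enumerate(columns)
        if len({row[idx] for row in rows}) == 1
    ]


def duplicate_groups(rows):
    """Map each row that occurs more than once to its occurrence count."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


def drop_duplicates(rows):
    """Keep the first occurrence of every row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


print(constant_columns(COLUMNS, ROWS))
print(duplicate_groups(ROWS))
print(len(drop_duplicates(ROWS)))
```

Constant columns carry no information for modelling, which is presumably why the report marks them as rejected. Whether the duplicate rows are genuine repeats or a loading artefact has to be decided from the source system.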

Reproduction

Analysis started: 2022-06-21 18:24:35.430234
Analysis finished: 2022-06-21 18:26:31.288176
Duration: 1 minute and 55.86 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

TransaccionProductoID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6035
Distinct (%): 84.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 211952.4606
Minimum: 89146
Maximum: 335846
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 27.9 KiB

Quantile statistics

Minimum: 89146
5-th percentile: 101733.45
Q1: 151949.25
Median: 210032
Q3: 274549.75
95-th percentile: 322747
Maximum: 335846
Range: 246700
Interquartile range (IQR): 122600.5

Descriptive statistics

Standard deviation: 70957.42693
Coefficient of variation (CV): 0.3347799159
Kurtosis: -1.205484652
Mean: 211952.4606
Median Absolute Deviation (MAD): 61138.5
Skewness: 0.01969804336
Sum: 1506981995
Variance: 5034956437
Monotonicity: Not monotonic
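
Several of these statistics are simple functions of one another, which gives a quick consistency check on the reported figures. The sketch below recomputes the range, IQR, coefficient of variation, variance and mean for TransaccionProductoID from numbers printed in this report. The variable names are illustrative.

```python
# Figures copied from the TransaccionProductoID section of this report.
n_obs = 7110
total = 1506981995
minimum, maximum = 89146, 335846
q1, q3 = 151949.25, 274549.75
mean = 211952.4606
std = 70957.42693

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = squared standard deviation
mean_from_sum = total / n_obs    # Mean = Sum / number of observations

print(value_range, iqr)
print(round(cv, 6), round(variance), round(mean_from_sum, 4))
```

Small differences in the last digits are expected, because the report prints rounded values.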
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
300347 | 2 | < 0.1%
96402 | 2 | < 0.1%
137660 | 2 | < 0.1%
236835 | 2 | < 0.1%
232716 | 2 | < 0.1%
238719 | 2 | < 0.1%
315723 | 2 | < 0.1%
166951 | 2 | < 0.1%
95285 | 2 | < 0.1%
176431 | 2 | < 0.1%
Other values (6025) | 7090 | 99.7%

Value | Count | Frequency (%)
89146 | 1 | < 0.1%
89147 | 1 | < 0.1%
89148 | 1 | < 0.1%
89149 | 1 | < 0.1%
89150 | 1 | < 0.1%
89151 | 1 | < 0.1%
89153 | 1 | < 0.1%
89154 | 1 | < 0.1%
89558 | 1 | < 0.1%
89559 | 1 | < 0.1%

Value | Count | Frequency (%)
335846 | 2 | < 0.1%
335845 | 1 | < 0.1%
335844 | 1 | < 0.1%
335842 | 1 | < 0.1%
335841 | 1 | < 0.1%
335840 | 1 | < 0.1%
335839 | 1 | < 0.1%
335838 | 1 | < 0.1%
335837 | 1 | < 0.1%
335509 | 2 | < 0.1%

ProductoID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.4907173
Minimum: 77
Maximum: 227
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 27.9 KiB

Quantile statistics

Minimum: 77
5-th percentile: 77
Q1: 80
Median: 95
Q3: 184
95-th percentile: 204
Maximum: 227
Range: 150
Interquartile range (IQR): 104

Descriptive statistics

Standard deviation: 51.43712542
Coefficient of variation (CV): 0.4268969973
Kurtosis: -1.338092278
Mean: 120.4907173
Median Absolute Deviation (MAD): 17
Skewness: 0.742631387
Sum: 856689
Variance: 2645.777871
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Value | Count | Frequency (%)
78 | 823 | 11.6%
86 | 817 | 11.5%
77 | 808 | 11.4%
204 | 805 | 11.3%
193 | 804 | 11.3%
95 | 802 | 11.3%
98 | 797 | 11.2%
80 | 785 | 11.0%
184 | 658 | 9.3%
222 | 2 | < 0.1%
Other values (7) | 9 | 0.1%

Value | Count | Frequency (%)
77 | 808 | 11.4%
78 | 823 | 11.6%
80 | 785 | 11.0%
86 | 817 | 11.5%
95 | 802 | 11.3%
98 | 797 | 11.2%
184 | 658 | 9.3%
193 | 804 | 11.3%
204 | 805 | 11.3%
220 | 1 | < 0.1%

Value | Count | Frequency (%)
227 | 1 | < 0.1%
226 | 1 | < 0.1%
225 | 1 | < 0.1%
224 | 2 | < 0.1%
223 | 2 | < 0.1%
222 | 2 | < 0.1%
221 | 1 | < 0.1%
220 | 1 | < 0.1%
204 | 805 | 11.3%
193 | 804 | 11.3%

NombreProducto
Categorical

HIGH CORRELATION

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value | Count
"The Gu" red shirt XML tag t-shirt (White) XS | 823
"The Gu" red shirt XML tag t-shirt (White) 5XL | 817
"The Gu" red shirt XML tag t-shirt (White) XXS | 808
Tape dispenser (Red) | 805
Black and orange glass with care despatch tape 48mmx75m | 804
Other values (12) | 3053

Length

Max length: 55
Median length: 45
Mean length: 42.75893108
Min length: 20

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 0.1%

Sample

1st row: Black and orange glass with care despatch tape 48mmx75m
2nd row: Black and orange glass with care despatch tape 48mmx75m
3rd row: Black and orange glass with care despatch tape 48mmx75m
4th row: Black and orange glass with care despatch tape 48mmx75m
5th row: Black and orange glass with care despatch tape 48mmx75m

Common Values

Value | Count | Frequency (%)
"The Gu" red shirt XML tag t-shirt (White) XS | 823 | 11.6%
"The Gu" red shirt XML tag t-shirt (White) 5XL | 817 | 11.5%
"The Gu" red shirt XML tag t-shirt (White) XXS | 808 | 11.4%
Tape dispenser (Red) | 805 | 11.3%
Black and orange glass with care despatch tape 48mmx75m | 804 | 11.3%
"The Gu" red shirt XML tag t-shirt (Black) XL | 802 | 11.3%
"The Gu" red shirt XML tag t-shirt (Black) 4XL | 797 | 11.2%
"The Gu" red shirt XML tag t-shirt (White) M | 785 | 11.0%
Shipping carton (Brown) 305x305x305mm | 658 | 9.3%
Chocolate beetles 250g | 2 | < 0.1%
Other values (7) | 9 | 0.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
red | 5637 | 10.1%
the | 4832 | 8.7%
shirt | 4832 | 8.7%
xml | 4832 | 8.7%
tag | 4832 | 8.7%
t-shirt | 4832 | 8.7%
gu | 4832 | 8.7%
white | 3235 | 5.8%
black | 2403 | 4.3%
tape | 1609 | 2.9%
Other values (32) | 13934 | 25.0%


TipoTransaccionID
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value | Count
11 | 7110

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 11
2nd row: 11
3rd row: 11
4th row: 11
5th row: 11

Common Values

Value | Count | Frequency (%)
11 | 7110 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
11 | 7110 | 100.0%


TipoTransaccionNombre
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value | Count
Stock Receipt | 7110

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Stock Receipt
2nd row: Stock Receipt
3rd row: Stock Receipt
4th row: Stock Receipt
5th row: Stock Receipt

Common Values

Value | Count | Frequency (%)
Stock Receipt | 7110 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
stock | 7110 | 50.0%
receipt | 7110 | 50.0%


ClienteID
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value | Count
0.0 | 7110

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 7110 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 7110 | 100.0%


InvoiceID
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value | Count
0.0 | 7110

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 7110 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 7110 | 100.0%


ProveedorID
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value | Count
4.0 | 4832
7.0 | 2267
1.0 | 11

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 7.0
2nd row: 7.0
3rd row: 7.0
4th row: 7.0
5th row: 7.0

Common Values

Value | Count | Frequency (%)
4.0 | 4832 | 68.0%
7.0 | 2267 | 31.9%
1.0 | 11 | 0.2%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
4.0 | 4832 | 68.0%
7.0 | 2267 | 31.9%
1.0 | 11 | 0.2%


NombreProveedor
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value | Count
Fabrikam Inc. | 4832
Litware Inc. | 2267
A Datum Corporation | 11

Length

Max length: 19
Median length: 13
Mean length: 12.69043601
Min length: 12

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Litware Inc.
2nd row: Litware Inc.
3rd row: Litware Inc.
4th row: Litware Inc.
5th row: Litware Inc.

Common Values

Value | Count | Frequency (%)
Fabrikam Inc. | 4832 | 68.0%
Litware Inc. | 2267 | 31.9%
A Datum Corporation | 11 | 0.2%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
inc | 7099 | 49.9%
fabrikam | 4832 | 34.0%
litware | 2267 | 15.9%
a | 11 | 0.1%
datum | 11 | 0.1%
corporation | 11 | 0.1%


OrdenDeCompraID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1471
Distinct (%): 20.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1345.997328
Minimum: 602
Maximum: 2072
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 602
5-th percentile: 682
Q1: 986
Median: 1347
Q3: 1710
95-th percentile: 1998
Maximum: 2072
Range: 1470
Interquartile range (IQR): 724

Descriptive statistics

Standard deviation: 420.3774096
Coefficient of variation (CV): 0.3123166748
Kurtosis: -1.182190014
Mean: 1345.997328
Median Absolute Deviation (MAD): 362
Skewness: -0.01399184079
Sum: 9570041
Variance: 176717.1665
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
829 | 11 | 0.2%
1579 | 10 | 0.1%
1498 | 10 | 0.1%
1186 | 10 | 0.1%
1237 | 10 | 0.1%
1313 | 10 | 0.1%
1043 | 9 | 0.1%
1028 | 9 | 0.1%
1606 | 9 | 0.1%
1764 | 9 | 0.1%
Other values (1461) | 7013 | 98.6%

Value | Count | Frequency (%)
602 | 6 | 0.1%
603 | 2 | < 0.1%
604 | 6 | 0.1%
605 | 3 | < 0.1%
606 | 6 | 0.1%
607 | 2 | < 0.1%
608 | 6 | 0.1%
609 | 2 | < 0.1%
610 | 6 | 0.1%
611 | 3 | < 0.1%

Value | Count | Frequency (%)
2072 | 4 | 0.1%
2071 | 6 | 0.1%
2070 | 4 | 0.1%
2069 | 3 | < 0.1%
2068 | 6 | 0.1%
2067 | 3 | < 0.1%
2066 | 7 | 0.1%
2065 | 3 | < 0.1%
2064 | 7 | 0.1%
2063 | 5 | 0.1%

FechaTransaccion
Categorical

HIGH CARDINALITY

Distinct: 1256
Distinct (%): 17.7%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value | Count
2015-04-06 07:00:00.0000000 | 20
2016-01-04 07:00:00.0000000 | 16
2014-05-19 07:00:00.0000000 | 16
2015-09-28 07:00:00.0000000 | 16
2014-12-29 07:00:00.0000000 | 15
Other values (1251) | 7027

Length

Max length: 27
Median length: 27
Mean length: 21.98171589
Min length: 11

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 0.6%

Sample

1st row: May 30,2016
2nd row: May 31,2016
3rd row: May 27,2016
4th row: May 26,2016
5th row: May 24,2016
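
The sample rows use a short form such as "May 30,2016" (11 characters), while the most common values use a long timestamp such as "2015-04-06 07:00:00.0000000" (27 characters). These match the reported minimum and maximum lengths, so the column appears to mix two date formats. A minimal sketch of one way to normalise both forms with the standard library follows. It assumes these are the only two formats present, and `parse_fecha` is a hypothetical helper name.

```python
from datetime import datetime


def parse_fecha(text):
    """Parse either 'May 30,2016' or '2016-05-25 07:00:00.0000000'."""
    text = text.strip()
    try:
        # Short form from the sample rows. %b expects English month
        # abbreviations, which holds under the default C locale.
        return datetime.strptime(text, "%b %d,%Y")
    except ValueError:
        pass
    # Long form: drop the 7-digit fraction, because %f accepts at most
    # 6 digits and the fraction is all zeros in the values shown.
    head = text.split(".", 1)[0]
    return datetime.strptime(head, "%Y-%m-%d %H:%M:%S")


print(parse_fecha("May 30,2016"))
print(parse_fecha("2016-05-25 07:00:00.0000000"))
```

The short form carries no time of day, so it parses to midnight, whereas the long form keeps its 07:00 time. Comparing by date only (`.date()`) avoids treating the two forms of the same day as different values.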

Common Values

Value | Count | Frequency (%)
2015-04-06 07:00:00.0000000 | 20 | 0.3%
2016-01-04 07:00:00.0000000 | 16 | 0.2%
2014-05-19 07:00:00.0000000 | 16 | 0.2%
2015-09-28 07:00:00.0000000 | 16 | 0.2%
2014-12-29 07:00:00.0000000 | 15 | 0.2%
2014-11-03 07:00:00.0000000 | 15 | 0.2%
2015-09-21 07:00:00.0000000 | 15 | 0.2%
2015-02-02 07:00:00.0000000 | 15 | 0.2%
2014-11-24 07:00:00.0000000 | 15 | 0.2%
2015-03-09 07:00:00.0000000 | 15 | 0.2%
Other values (1246) | 6952 | 97.8%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
07:00:00.0000000 | 4880 | 34.3%
may | 235 | 1.7%
mar | 229 | 1.6%
jan | 226 | 1.6%
apr | 223 | 1.6%
feb | 213 | 1.5%
oct | 170 | 1.2%
dec | 169 | 1.2%
jun | 165 | 1.2%
jul | 156 | 1.1%
Other values (728) | 7554 | 53.1%


Cantidad
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3317
Distinct (%): 46.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21758.43826
Minimum: 10
Maximum: 67368
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 10
5-th percentile: 120
Q1: 12100
Median: 19776
Q3: 30814.5
95-th percentile: 43728
Maximum: 67368
Range: 67358
Interquartile range (IQR): 18714.5

Descriptive statistics

Standard deviation: 13565.13099
Coefficient of variation (CV): 0.6234423091
Kurtosis: 0.2832102274
Mean: 21758.43826
Median Absolute Deviation (MAD): 8856
Skewness: 0.6138438548
Sum: 154702496
Variance: 184012778.7
Monotonicity: Decreasing
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
120 | 54 | 0.8%
72 | 41 | 0.6%
48 | 38 | 0.5%
60 | 37 | 0.5%
96 | 35 | 0.5%
36 | 35 | 0.5%
108 | 30 | 0.4%
12 | 29 | 0.4%
84 | 27 | 0.4%
24 | 26 | 0.4%
Other values (3307) | 6758 | 95.0%

Value | Count | Frequency (%)
10 | 4 | 0.1%
12 | 29 | 0.4%
20 | 6 | 0.1%
24 | 26 | 0.4%
30 | 2 | < 0.1%
36 | 35 | 0.5%
40 | 2 | < 0.1%
48 | 38 | 0.5%
50 | 8 | 0.1%
60 | 37 | 0.5%

Value | Count | Frequency (%)
67368 | 1 | < 0.1%
67272 | 1 | < 0.1%
67200 | 1 | < 0.1%
66840 | 1 | < 0.1%
66744 | 1 | < 0.1%
66696 | 2 | < 0.1%
66480 | 1 | < 0.1%
66288 | 1 | < 0.1%
65904 | 1 | < 0.1%
65760 | 2 | < 0.1%

Interactions

[16 interaction plots omitted from this text version]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
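
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is a minimal illustration rather than the profiler's implementation. The helper names are hypothetical, and tied values receive the mean of their ranks, which is the usual convention.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel between numerator and denominator, so they
    are omitted.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: nonlinear but strictly increasing
print(spearman(x, y), pearson(x, y))
```

For this strictly increasing but nonlinear relationship, ρ is 1 up to floating-point error while Pearson's r stays below 1.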

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
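
The invariance property can be checked numerically. The sketch below implements the formula as stated (covariance over the product of standard deviations) and compares r before and after shifting and rescaling each variable. The data values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their std devs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


# Made-up, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson(x, y)
# Separate location and (positive) scale changes applied to each variable.
r_transformed = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

# Exact linear relationships with very different slopes.
steep = pearson(x, [100 * a for a in x])
shallow = pearson(x, [0.01 * a for a in x])

print(r, r_transformed)
print(steep, shallow)
```

Both exact lines give r equal to 1 regardless of slope. A negative scale factor would flip the sign of r but not its magnitude.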

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
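
The pair-counting definition translates directly into code. The sketch below implements exactly the formula as stated, which is the variant usually called τ-a. Library implementations often use the tie-adjusted variant τ-b instead, so results can differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables move in the same direction
        elif product < 0:
            discordant += 1  # the variables move in opposite directions
        # product == 0 means a tie in x or y; it counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau(x, y))
```

In this example 7 of the 10 pairs are concordant and 3 are discordant, giving τ = (7 - 3) / 10.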

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
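
A minimal sketch of the bias-corrected estimator follows. It uses the correction as it is commonly stated for Bergsma's 2013 proposal: with r rows, k columns and n observations, φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at zero, and r and k are each shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). The function name and toy data are illustrative, and this is not taken from the profiler's source.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return float("nan")  # undefined when either variable is constant
    return math.sqrt(phi2_corr / denom)


# Toy data: each supplier ID maps to exactly one supplier name.
ids = ["4.0"] * 6 + ["7.0"] * 4
names = ["Fabrikam Inc."] * 6 + ["Litware Inc."] * 4
print(cramers_v_corrected(ids, names))
```

A one-to-one mapping between two columns, as between ProveedorID and NombreProveedor in the frequency tables above, gives a value of 1 up to floating-point error. A constant column gives an undefined result.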

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Index | TransaccionProductoID | ProductoID | NombreProducto | TipoTransaccionID | TipoTransaccionNombre | ClienteID | InvoiceID | ProveedorID | NombreProveedor | OrdenDeCompraID | FechaTransaccion | Cantidad
0 | 335504 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2069.0 | May 30,2016 | 67368.0
1 | 335845 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2072.0 | May 31,2016 | 67272.0
2 | 334872 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2067.0 | May 27,2016 | 67200.0
3 | 334385 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2065.0 | May 26,2016 | 66840.0
4 | 333714 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2061.0 | May 24,2016 | 66744.0
5 | 334073 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2063.0 | 2016-05-25 07:00:00.0000000 | 66696.0
6 | 334073 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2063.0 | 2016-05-25 07:00:00.0000000 | 66696.0
7 | 333459 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2057.0 | May 23,2016 | 66480.0
8 | 332869 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2055.0 | 2016-05-20 07:00:00.0000000 | 66288.0
9 | 332332 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 2053.0 | May 19,2016 | 65904.0

Last rows

Index | TransaccionProductoID | ProductoID | NombreProducto | TipoTransaccionID | TipoTransaccionNombre | ClienteID | InvoiceID | ProveedorID | NombreProveedor | OrdenDeCompraID | FechaTransaccion | Cantidad
7100 | 258785 | 77 | "The Gu" red shirt XML tag t-shirt (White) XXS | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 1622.0 | 2015-09-07 07:00:00.0000000 | 12.0
7101 | 320530 | 77 | "The Gu" red shirt XML tag t-shirt (White) XXS | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 1988.0 | 2016-04-11 07:00:00.0000000 | 12.0
7102 | 333462 | 77 | "The Gu" red shirt XML tag t-shirt (White) XXS | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 2058.0 | 2016-05-23 07:00:00.0000000 | 12.0
7103 | 265276 | 77 | "The Gu" red shirt XML tag t-shirt (White) XXS | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 1657.0 | 2015-09-28 07:00:00.0000000 | 12.0
7104 | 265276 | 77 | "The Gu" red shirt XML tag t-shirt (White) XXS | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 1657.0 | 2015-09-28 07:00:00.0000000 | 12.0
7105 | 333462 | 77 | "The Gu" red shirt XML tag t-shirt (White) XXS | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 2058.0 | 2016-05-23 07:00:00.0000000 | 12.0
7106 | 279657 | 204 | Tape dispenser (Red) | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 1741.0 | Nov 16,2015 | 10.0
7107 | 170642 | 204 | Tape dispenser (Red) | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 1108.0 | 2014-11-03 07:00:00.0000000 | 10.0
7108 | 174625 | 204 | Tape dispenser (Red) | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 1132.0 | 2014-11-17 07:00:00.0000000 | 10.0
7109 | 194968 | 204 | Tape dispenser (Red) | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 1260.0 | 2015-02-02 07:00:00.0000000 | 10.0

Duplicate rows

Most frequently occurring

Index | TransaccionProductoID | ProductoID | NombreProducto | TipoTransaccionID | TipoTransaccionNombre | ClienteID | InvoiceID | ProveedorID | NombreProveedor | OrdenDeCompraID | FechaTransaccion | Cantidad | # duplicates
0 | 89565 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 605.0 | 2014-01-01 07:00:00.0000000 | 10200.0 | 2
1 | 90854 | 204 | Tape dispenser (Red) | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 611.0 | 2014-01-06 07:00:00.0000000 | 5270.0 | 2
2 | 91124 | 77 | "The Gu" red shirt XML tag t-shirt (White) XXS | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 613.0 | 2014-01-07 07:00:00.0000000 | 10620.0 | 2
3 | 91131 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 614.0 | 2014-01-07 07:00:00.0000000 | 10368.0 | 2
4 | 91132 | 204 | Tape dispenser (Red) | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 614.0 | 2014-01-07 07:00:00.0000000 | 5230.0 | 2
5 | 91409 | 204 | Tape dispenser (Red) | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 616.0 | 2014-01-08 07:00:00.0000000 | 5300.0 | 2
6 | 91617 | 78 | "The Gu" red shirt XML tag t-shirt (White) XS | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 617.0 | 2014-01-09 07:00:00.0000000 | 12348.0 | 2
7 | 91620 | 95 | "The Gu" red shirt XML tag t-shirt (Black) XL | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 617.0 | 2014-01-09 07:00:00.0000000 | 6348.0 | 2
8 | 91623 | 193 | Black and orange glass with care despatch tape 48mmx75m | 11 | Stock Receipt | 0.0 | 0.0 | 7.0 | Litware Inc. | 618.0 | 2014-01-09 07:00:00.0000000 | 10296.0 | 2
9 | 91814 | 98 | "The Gu" red shirt XML tag t-shirt (Black) 4XL | 11 | Stock Receipt | 0.0 | 0.0 | 4.0 | Fabrikam Inc. | 619.0 | 2014-01-10 07:00:00.0000000 | 12804.0 | 2